
Activation Function Analysis

An interactive visualization of neural network activation functions and their training characteristics

Experiment Controls

[Control panel: training parameters (learning rate: 0.001, network depth: 2 layers) and dataset configuration.]
Activation Functions

ReLU

Rectified Linear Unit

f(x)=max(0,x)

Prevents the vanishing-gradient problem for positive inputs and is computationally efficient, but can suffer from "dying neurons" whose gradients are permanently zero.
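
A minimal NumPy sketch (not the page's own code) of ReLU and its derivative; the gradient is exactly zero for negative pre-activations, which is what produces "dying neurons":

  import numpy as np

  def relu(x):
      # f(x) = max(0, x)
      return np.maximum(0.0, x)

  def relu_grad(x):
      # The derivative is 1 for x > 0 and 0 otherwise, so a unit whose
      # pre-activation stays negative receives no gradient and stops learning.
      return (x > 0).astype(x.dtype)

  x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
  print(relu(x))       # [0.  0.  0.  0.5 2. ]
  print(relu_grad(x))  # [0. 0. 0. 1. 1.]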

Sigmoid

Logistic Function

f(x)=1/(1+e⁻ˣ)

Outputs values between 0 and 1. Useful for probability outputs, but suffers from vanishing gradients because its derivative never exceeds 0.25.
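
For illustration, a similar sketch of the sigmoid and its derivative; since the derivative is sigma(x) * (1 - sigma(x)) and never exceeds 0.25, every sigmoid layer scales backpropagated gradients down by at least a factor of four:

  import numpy as np

  def sigmoid(x):
      # f(x) = 1 / (1 + e^(-x))
      return 1.0 / (1.0 + np.exp(-x))

  def sigmoid_grad(x):
      # sigma'(x) = sigma(x) * (1 - sigma(x)); its maximum, at x = 0, is 0.25.
      s = sigmoid(x)
      return s * (1.0 - s)

  x = np.array([-4.0, 0.0, 4.0])
  print(sigmoid(x))       # approx [0.018 0.5   0.982]
  print(sigmoid_grad(x))  # approx [0.018 0.25  0.018]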

Tanh

Hyperbolic Tangent

f(x)=tanh(x)

Zero-centered output between -1 and 1. Better than sigmoid for hidden layers, but still saturates for large inputs, so gradients can vanish in deep networks.
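
And a corresponding sketch for tanh, whose derivative 1 - tanh(x)^2 reaches 1 at the origin but still decays toward zero once the input saturates:

  import numpy as np

  def tanh_grad(x):
      # d/dx tanh(x) = 1 - tanh(x)^2: equals 1 at x = 0, near 0 once |x| is large
      return 1.0 - np.tanh(x) ** 2

  x = np.array([-3.0, 0.0, 3.0])
  print(np.tanh(x))    # approx [-0.995  0.     0.995]
  print(tanh_grad(x))  # approx [ 0.0099 1.     0.0099]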

Linear

Identity Function

f(x)=x

No transformation applied. Used as a baseline comparison and for output layers in regression.
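
One way to see why a purely linear network is only a baseline: without a nonlinearity, stacked layers collapse into a single linear map. A small check with hypothetical random weights (biases omitted):

  import numpy as np

  rng = np.random.default_rng(0)
  W1 = rng.normal(size=(4, 3))   # hypothetical first-layer weights
  W2 = rng.normal(size=(2, 4))   # hypothetical second-layer weights
  x = rng.normal(size=3)

  # Two layers with linear activations applied in sequence...
  deep = W2 @ (W1 @ x)
  # ...are equivalent to one layer with the composed matrix W2 @ W1.
  shallow = (W2 @ W1) @ x
  print(np.allclose(deep, shallow))  # True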

Function Performance

[Live metrics panel: per-epoch MSE and gradient magnitude for the Linear, ReLU, Sigmoid, and Tanh networks.]

Gradient Analysis

[Chart: average gradient magnitude in the first hidden layer for the Linear, ReLU, Sigmoid, and Tanh networks.]

Technical Details

This visualization demonstrates how different activation functions perform when training neural networks on various datasets. The interface allows you to:

  • Compare ReLU, Sigmoid, Tanh, and Linear activation functions
  • Adjust network depth and learning rate parameters
  • Visualize gradient flow and vanishing gradient problems
  • Test performance on different dataset complexities

Observe how ReLU maintains strong gradients while Sigmoid and Tanh suffer from vanishing gradients, especially in deeper networks.
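
The same effect can be reproduced outside the interface with a rough sketch along the following lines (hypothetical layer width, depth, and random weights; not the page's actual training code), which backpropagates a dummy loss through a stack of hidden layers and reports the average gradient magnitude reaching the first one:

  import numpy as np

  # Activation functions paired with their derivatives, written as
  # derivative(z, a) where z is the pre-activation and a = f(z).
  ACTS = {
      "linear":  (lambda z: z,                        lambda z, a: np.ones_like(z)),
      "relu":    (lambda z: np.maximum(0.0, z),       lambda z, a: (z > 0).astype(float)),
      "sigmoid": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda z, a: a * (1.0 - a)),
      "tanh":    (np.tanh,                            lambda z, a: 1.0 - a ** 2),
  }

  def first_layer_grad(act_name, depth=6, width=32, seed=0):
      # Average |dL/dz| at the first hidden layer of a random MLP.
      f, df = ACTS[act_name]
      rng = np.random.default_rng(seed)
      x = rng.normal(size=width)
      Ws = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
            for _ in range(depth)]

      # Forward pass, caching pre-activations z and activations a.
      zs, acts = [], [x]
      for W in Ws:
          z = W @ acts[-1]
          zs.append(z)
          acts.append(f(z))

      # Backward pass from a dummy loss L = sum(final activations).
      g = np.ones(width)                          # dL/da at the top layer
      for W, z, a in zip(reversed(Ws), reversed(zs), reversed(acts[1:])):
          g = g * df(z, a)                        # dL/dz for this layer
          if W is Ws[0]:
              return np.mean(np.abs(g))           # gradient at the first hidden layer
          g = W.T @ g                             # dL/da for the layer below

  for name in ACTS:
      print(f"{name:8s} {first_layer_grad(name):.2e}")
  # Typical result: the sigmoid gradient is orders of magnitude smaller than
  # the linear or ReLU gradient, with tanh in between.

With settings like these, the sigmoid network's first-layer gradient typically comes out orders of magnitude smaller than the linear or ReLU one, with tanh in between, mirroring the Gradient Analysis chart above.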
